VIDEO ENCODING METHOD



FIELD OF THE INVENTION 

The present invention generally relates to the field of data compression
and, more specifically, to a method of encoding a sequence of frames, composed of
picture elements (pixels), by means of a three-dimensional (3D) subband decomposition
involving a filtering step applied, in the sequence considered as a 3D volume, to the
spatial-temporal data which correspond in said sequence to each one of successive
groups of frames (GOFs), these GOFs being themselves subdivided into successive pairs
of frames (POFs) including a so-called previous frame and a so-called current frame,
said decomposition being applied to said GOFs together with motion estimation and
compensation steps performed in each GOF on said POFs and on corresponding pairs
of low-frequency temporal subbands (POSs) obtained at each temporal decomposition
level.

The invention also relates to a computer programme comprising a set of 
instructions for the implementation of said encoding method, when said programme is 
carried out by a processor included in an encoding device. 

BACKGROUND OF THE INVENTION 

In recent years, three-dimensional (3D) subband analysis, based on a 3D,
or (2D+t), wavelet decomposition of a sequence of frames considered as a 3D volume,
has been increasingly studied for video compression. The coefficients generated by
the wavelet transform constitute a hierarchical pyramid in which the spatio-temporal
relationship is defined by 3D orientation trees expressing the parent-offspring
dependencies between coefficients; the in-depth scanning of the generated
coefficients in these hierarchical trees, combined with a progressive bitplane encoding
technique, provides the desired quality scalability. In practice, this approach generates
motion compensated temporal subbands using a simple two-tap wavelet filter, as
illustrated in Fig.1 for a GOF of eight frames.

In the illustrated implementation, the input video sequence is divided into
Groups of Frames (GOFs), and each GOF, itself subdivided into successive pairs of
frames (each pair being an input of a so-called Motion-Compensated Temporal Filtering,
or MCTF, module), is first motion-compensated (MC) and then temporally filtered (TF).
The resulting low-frequency (L) temporal subbands of the first temporal decomposition
level are further filtered (TF), and the process may stop after an arbitrary number of
decompositions, resulting in one or more low-frequency subbands called root temporal
subbands (the illustration presents an example with two decomposition levels resulting
in two root subbands LL). In the example of Fig.1, the frames of the illustrated
group are referenced F1 to F8, the dotted arrows correspond to a high-pass temporal
filtering and the other arrows to a low-pass temporal filtering. Two
stages of decomposition are shown (L and H = first stage ; LL and LH = second stage).
At each temporal decomposition level of the illustrated group of 8 frames, a group of
motion vector fields is generated (in the present example, MV4 at the first level, MV3 at
the second one).

When a Haar multiresolution analysis is used for the temporal
decomposition, one motion vector field is generated between every two frames of
the considered group of frames at each temporal decomposition level, so that the number of
motion vector fields is equal to half the number of frames in the temporal subband, i.e.
four motion vector fields at the first level and two at the second one. Motion
estimation (ME) and motion compensation (MC) are only performed every two frames
of the input sequence (generally in the forward direction), due to the temporal
downsampling by two of the simple wavelet filter. With these very simple filters, each
low-frequency temporal subband (L) represents a temporal average of the input pair of
frames, whereas the high-frequency one (H) contains the residual error after the MCTF
step.
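The Haar temporal analysis described above can be sketched as follows. This is a minimal illustration with NumPy, not the method of the invention: motion compensation is deliberately left out so that only the pyramid structure shows, and the normalization (L as the pair average, H as the difference between the current and previous frame) and the function names `haar_temporal_pair` and `temporal_pyramid` are assumptions.

```python
import numpy as np


def haar_temporal_pair(prev, cur):
    """One Haar temporal step on a pair of frames, without motion compensation.

    Assumed normalization: L is the average of the pair and H the difference
    cur - prev, so that prev = L - H / 2 and cur = L + H / 2.
    """
    low = (prev + cur) / 2.0
    high = cur - prev
    return low, high


def temporal_pyramid(gof, levels):
    """Decompose a GOF over `levels` temporal levels (hypothetical helper).

    Returns the root low-frequency subbands, the high-frequency subbands of
    each level and the number of motion vector fields each level would need
    (one field per pair of frames or subbands).
    """
    lows = list(gof)
    highs_per_level = []
    mv_fields_per_level = []
    for _ in range(levels):
        if len(lows) % 2:
            raise ValueError("an even number of frames is needed at each level")
        pairs = [(lows[i], lows[i + 1]) for i in range(0, len(lows), 2)]
        mv_fields_per_level.append(len(pairs))
        results = [haar_temporal_pair(p, c) for p, c in pairs]
        lows = [low for low, _ in results]
        highs_per_level.append([high for _, high in results])
    return lows, highs_per_level, mv_fields_per_level


# GOF of eight constant 4x4 frames F1..F8 (pixel value k in frame Fk),
# analysed over two temporal levels as in Fig.1.
gof = [np.full((4, 4), float(k)) for k in range(1, 9)]
roots, highs, mv_counts = temporal_pyramid(gof, levels=2)
print(mv_counts)       # [4, 2]
print(len(roots))      # 2
print(roots[0][0, 0])  # 2.5, the average of F1..F4
```

The counts [4, 2] match the four and two motion vector fields mentioned above for a GOF of eight frames.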

Unfortunately, motion compensated temporal filtering may raise the
problem of unconnected picture elements (pixels), which are not filtered at all (and
also the problem of double-connected pixels, which are filtered twice). The number of
unconnected pixels is a weakness of 3D subband codec approaches because
it strongly impacts the resulting picture quality (particularly in occlusion regions). This is
especially true for high-motion sequences and for the final temporal decomposition levels,
where the temporal correlation is poor. The number of these unconnected pixels
depends on the dense motion vector field generated by the motion
estimation.
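Which previous-frame pixels end up unconnected (or multiply connected) follows directly from the dense motion field. The sketch below assumes 1D frames, integer displacements and the convention that current pixel x is linked to previous pixel x - m[x]; the function name `connection_map` is hypothetical.

```python
from collections import Counter


def connection_map(motion, width):
    """Classify previous-frame pixels from a dense motion field.

    Assumptions for this sketch: frames are 1D rows of `width` pixels and
    current pixel x is linked to previous pixel x - motion[x]; references
    falling outside the frame are clipped to the nearest border pixel.
    Returns (unconnected, multiply_connected) previous-pixel positions.
    """
    hits = Counter()
    for x, m in enumerate(motion):
        ref = min(max(x - m, 0), width - 1)
        hits[ref] += 1
    unconnected = [x for x in range(width) if hits[x] == 0]
    multiple = [x for x in range(width) if hits[x] > 1]
    return unconnected, multiple


# Uniform motion: every previous pixel is filtered exactly once.
print(connection_map([0] * 8, 8))                    # ([], [])
# Motion discontinuity: the right half moves by two pixels.
print(connection_map([0, 0, 0, 0, 2, 2, 2, 2], 8))   # ([6, 7], [2, 3])
```

In the second case previous pixels 6 and 7 are referenced by no current pixel (unconnected), while 2 and 3 are referenced twice (double-connected).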

Current criteria for optimal motion vector search used in motion estimators
do not take into account the number of unconnected pixels that will result from
motion compensation. The most sophisticated algorithms use a rate/distortion criterion
that minimizes a cost function depending on the displaced difference
energy (distortion) and on the number of bits spent to transmit the motion vector (rate).
For example, the motion search returns the motion vector that minimizes :

$$J(m) = \mathrm{SAD}(s, c(m)) + \lambda_{MOTION} \cdot R(m - p) \qquad (1)$$

with $m = (m_x, m_y)^T$ being the motion vector, $p = (p_x, p_y)^T$ being the prediction for the
motion vector, and $\lambda_{MOTION}$ being the Lagrange multiplier. The rate term $R(m - p)$
represents the motion information only, and SAD is used as the distortion measure. It is
computed as :

$$\mathrm{SAD}(s, c(m)) = \sum_{x=1}^{B} \sum_{y=1}^{B} \left| s[x, y] - c[x - m_x, y - m_y] \right| \qquad (2)$$

with s being the original video signal, c being the coded video signal and B being the
block size (note that B can be 1).
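The motion search of equations (1) and (2) can be illustrated by an exhaustive block search. In this sketch the rate term R(m - p) is approximated by the length of a signed Exp-Golomb code of each vector component, which is an assumption (the text does not say how motion vectors are coded); the function names and the synthetic test pattern are hypothetical, and frames are NumPy arrays indexed [y, x].

```python
import numpy as np


def se_golomb_bits(v):
    """Bit length of the signed Exp-Golomb code of integer v (assumed rate model)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def rate(m, p):
    """Assumed R(m - p): bits spent on each component of the vector difference."""
    return sum(se_golomb_bits(mi - pi) for mi, pi in zip(m, p))


def sad(s, c, top, left, B, m):
    """Equation (2): sum of |s[y, x] - c[y - m_y, x - m_x]| over a B x B block."""
    mx, my = m
    block = s[top:top + B, left:left + B].astype(int)
    ref = c[top - my:top - my + B, left - mx:left - mx + B].astype(int)
    return int(np.abs(block - ref).sum())


def motion_search(s, c, top, left, B, p, lam_motion, search_range):
    """Full search returning (J(m), m) for the vector minimizing equation (1)."""
    h, w = s.shape
    best = None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            # Skip vectors whose reference block would leave the frame.
            if not (0 <= top - my and top - my + B <= h
                    and 0 <= left - mx and left - mx + B <= w):
                continue
            cost = sad(s, c, top, left, B, (mx, my)) + lam_motion * rate((mx, my), p)
            if best is None or cost < best[0]:
                best = (cost, (mx, my))
    return best


# Synthetic reference frame c and current frame s, with s[y, x] = c[y - 1, x - 2].
y, x = np.mgrid[0:12, 0:12]
c = (7 * x + 13 * y) % 31
s = np.roll(c, shift=(1, 2), axis=(0, 1))
best = motion_search(s, c, top=4, left=4, B=4, p=(0, 0), lam_motion=1.0, search_range=2)
print(best)  # (8.0, (2, 1))
```

The true displacement (2, 1) gives a zero SAD, so J(m) reduces to the rate term, 5 + 3 = 8 bits, weighted by a Lagrange multiplier of 1.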
Unfortunately, these algorithms do not take into account the distortion
introduced by unconnected pixels during the inverse motion compensation, because
these optimizations are usually applied to hybrid coding, in which no inverse motion
compensation is performed. A previous European patent application n°02293062.2
(PHFR020136), filed by the applicant on December 11, 2002, has proposed a solution
for avoiding this drawback. Said solution, in which the set of unconnected pixels is
taken into account in the distortion measure, relates to a method of encoding a sequence
of frames, composed of picture elements (pixels), by means of a three-dimensional (3D)
subband decomposition involving a filtering step applied, in the sequence considered as a
3D volume, to the spatial-temporal data which correspond in said sequence to each one of
the successive GOFs. These GOFs are themselves subdivided into successive pairs of
frames (POFs) including a so-called previous frame and a so-called current frame, said
decomposition being applied to the GOFs together with motion estimation and
compensation steps performed in each GOF on said POFs and on corresponding pairs of
low-frequency temporal subbands (POSs) obtained at each temporal decomposition level.

The process of motion compensated temporal filtering leads, in the previous frames, on the
one hand to connected pixels, which are filtered along a motion trajectory corresponding to
motion vectors defined by means of said motion estimation steps, and on the other hand to
a residual number of so-called unconnected pixels, which are not filtered at all. Each motion
estimation step then comprises a motion search provided for returning a motion vector that
minimizes a cost function depending at least on a distortion criterion involving a distortion
measure, said distortion measure being also applied to the set of said unconnected pixels.

More precisely, in order to take into account the set of unconnected pixels in the
distortion measure, it has been proposed to introduce a new rate/distortion criterion that
extends equation (1) by taking into account the unconnected pixels phenomenon. This is
illustrated in equations (3) and (4) :

$$K(m) = J(m) + \lambda_{UNCONNECTED} \cdot D(S_{UNCONNECTED}(m)) \qquad (3)$$

$$K(m) = \mathrm{SAD}(s, c(m)) + \lambda_{UNCONNECTED} \cdot D(S_{UNCONNECTED}(m)) + \lambda_{MOTION} \cdot R(m - p) \qquad (4)$$

with $D(S_{UNCONNECTED}(m))$ being the distortion measure for the set $S_{UNCONNECTED}$ of
unconnected pixels resulting from motion vector m. Several distortion measures can be [...]
unconnected pixels for the motion vector under study. Nevertheless, the real set of
unconnected pixels resulting from a motion search can be computed only when the motion
vector information is available for the whole frame, so that an optimal solution is hardly
achievable.
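The extended criterion of equation (4) can be sketched on 1D frames split into parts of B pixels, each part carrying one integer motion vector. Two assumptions are made because the list of candidate distortion measures is not recoverable from the text: D(S_UNCONNECTED(m)) is taken as the number of unconnected previous-frame pixels, and the unconnected pixels charged to a part are those lying inside that part's own footprint. The rate R(m - p) is approximated by |m - p|, and the function names are hypothetical. The example shows that the same vector for part 1 is charged differently depending on the vector chosen for part 2, which is why the real set of unconnected pixels is only known once the whole motion field is available.

```python
def unconnected_in_part(field, k, B, W):
    """Previous-frame pixels of part k's footprint that no current pixel references.

    `field` maps a part index to its integer displacement; current pixel x of
    part j is linked to previous pixel x - field[j]. Parts missing from
    `field` have not been motion compensated yet and reference nothing.
    """
    hit = [False] * W
    for j, m in field.items():
        for x in range(j * B, (j + 1) * B):
            if not 0 <= x - m < W:
                raise ValueError(f"vector {m} of part {j} points outside the frame")
            hit[x - m] = True
    return [x for x in range(k * B, (k + 1) * B) if not hit[x]]


def K(s, c, field, k, B, lam_unconnected, lam_motion, p=0):
    """Equation (4) for part k, with D = count of unconnected pixels (assumed)."""
    m = field[k]
    sad = sum(abs(s[x] - c[x - m]) for x in range(k * B, (k + 1) * B))
    d = len(unconnected_in_part(field, k, B, len(s)))
    return sad + lam_unconnected * d + lam_motion * abs(m - p)


c = [10, 20, 30, 40, 50, 60, 70, 80]   # previous frame
s = list(c)                            # current frame (static scene)
B = 2                                  # four parts of two pixels
# Part 1 uses m = -2 in both fields; only the vector of part 2 differs.
print(K(s, c, {0: 0, 1: -2, 2: 0, 3: 0}, 1, B, 10, 1))  # 62
print(K(s, c, {0: 0, 1: -2, 2: 2, 3: 0}, 1, B, 10, 1))  # 42
```

In the first field, previous pixels 2 and 3 are referenced by no current pixel and are charged to part 1; in the second, part 2 points back at them and the penalty disappears.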



A sub-optimal implementation has then been proposed in that document,
and it is recalled here. For a given part of the image to be motion compensated (a part of
the image can be a pixel, a block of pixels, a macroblock of pixels or any region, provided
that the set of parts covers the whole image without any overlap) and for a given
motion vector candidate m, a temporary inverse motion compensation is applied, the set
of unconnected pixels is identified and $D(S_{UNCONNECTED}(m))$ can be evaluated. The
current K(m) value can then be computed and compared to the current minimum value
$K_{min}(m)$ to check whether the candidate motion vector yields a lower K(m) value. When all
the candidates have been tested, the (final) inverse motion compensation is applied with the
best candidate (identifying connected and unconnected pixels). The next part of the image
can then be processed, and so on until the whole image has been processed.
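The single-pass implementation recalled above can be sketched in the same hypothetical 1D setting (one integer vector per part of B pixels, R(m - p) approximated by |m - p| with p = 0). To mimic the behaviour described in the text, where the set of unconnected pixels is empty for the first part, this sketch assumes that the unconnected pixels seen when part k is processed are the previous-frame pixels lying before part k that no current pixel references after the temporary inverse motion compensation. The function name and sample values are illustrative only.

```python
def single_pass(s, c, B, candidates, lam_unconnected, lam_motion):
    """One-part-at-a-time minimization of K(m), in scan order (sketch).

    Assumption: the unconnected pixels seen when part k is processed are the
    previous-frame pixels at positions < k * B that no current pixel
    references after the temporary inverse motion compensation.
    For part 0 this set is therefore always empty.
    """
    W = len(s)
    field = {}                                     # vectors already decided
    for k in range(W // B):
        part = range(k * B, (k + 1) * B)
        best_m, k_min = None, None
        for m in candidates:
            if not all(0 <= x - m < W for x in part):
                continue                           # vector leaves the frame
            temp = dict(field)
            temp[k] = m                            # temporary inverse MC
            hit = [False] * W
            for j, mj in temp.items():
                for x in range(j * B, (j + 1) * B):
                    hit[x - mj] = True
            d = sum(1 for x in range(k * B) if not hit[x])
            sad = sum(abs(s[x] - c[x - m]) for x in part)
            cost = sad + lam_unconnected * d + lam_motion * abs(m)
            if k_min is None or cost < k_min:      # compare with K_min
                best_m, k_min = m, cost
        field[k] = best_m                          # final inverse MC with the best candidate
    return field


c = [10, 20, 30, 40, 50, 60, 70, 80]   # previous frame
s = [21, 31, 30, 40, 50, 60, 70, 80]   # current frame
print(single_pass(s, c, B=2, candidates=(-2, 0, 2), lam_unconnected=10, lam_motion=1))
# {0: -2, 1: 0, 2: 0, 3: 0}
```

With these values, part 0 is evaluated while no pixel can yet be declared unconnected, so it picks m = -2 (cost 20 against 22) and leaves previous pixels 0 and 1 unconnected at no cost; parts 1, 2 and 3 are then each charged 10 · 2 = 20 for those same pixels. Over the whole frame this field totals 20 for the J(m) terms plus 10 · 2 for the two unconnected pixels, i.e. 40, whereas the all-zero field would give 22 with no unconnected pixel.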

However, in this non-recursive implementation, the resulting decisions are not
spatially homogeneous over the whole image : for the first part of the image to be motion
compensated, the set of unconnected pixels is empty, while the probability of unconnected
pixels for the last part of the image to be motion compensated is very high. This situation
can lead to heterogeneous spatial distortions.

SUMMARY OF THE INVENTION 

It is therefore an object of the invention to avoid such a drawback and to
propose a video encoding method in which the problem of heterogeneous treatment
resulting from the single-pass implementation recalled above is eliminated or at least
reduced.

To this end, the invention relates to a method of encoding a sequence of
frames, composed of picture elements (pixels), by means of a three-dimensional (3D)
subband decomposition involving a filtering step applied, in the sequence considered as a
3D volume, to the spatial-temporal data which correspond in said sequence to each one of
successive groups of frames (GOFs), these GOFs being themselves subdivided into
successive pairs of frames (POFs) including a so-called previous frame and a so-called
current frame, said decomposition being applied to said GOFs together with motion
estimation and compensation steps performed in each GOF on said POFs and on
corresponding pairs of low-frequency temporal subbands (POSs) obtained at each temporal
decomposition level, this process of motion compensated temporal filtering leading in the
previous frames on the one hand to connected pixels, which are filtered along a motion
trajectory corresponding to motion vectors defined by means of said motion estimation
steps, and on the other hand to a residual number of so-called unconnected pixels, which are
not filtered at all, each motion estimation step comprising a motion search provided for
returning a motion vector that minimizes a cost function depending at least on a distortion
criterion involving a distortion measure, said distortion measure being moreover applied to
the set of said unconnected pixels according to the measures and rules defined in claims 2
and 3.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described, by way of example, with reference to the
accompanying drawing in which :

- Fig.1 shows a temporal multiresolution analysis with motion compensation.

DETAILED DESCRIPTION OF THE INVENTION 

In order to eliminate the problem of heterogeneous spatial distortions observed
with the previous implementation, it is now proposed to minimize the global
criterion $\sum_{\text{all parts}} K(m)$ over the whole image, which can be done with a
multiple-pass implementation including the following steps.

First, for all the parts of the image, the optimal motion vector $m_{opt}$ is
computed, as well as a set of sub-optimal motion vectors $\{m_{sub\text{-}opt}\}$ that provide
the minimum values for J(m) of equation (1) (the number of unconnected pixels is not
used at this stage). The number of sub-optimal vectors $N_{sub\text{-}opt}$ is implementation
dependent. For all these vectors, the corresponding value of the criterion J(m) is stored,
so that $J(m_{opt})$ and $\{J(m_{sub\text{-}opt})\}$ are generated. Then an inverse motion
compensation is applied for the optimal motion vectors so that $\sum_{\text{all parts}} K(m_{opt})$
can be computed (note that $\sum_{\text{all parts}} K(m_{opt})$ is not the optimal value of
$\sum_{\text{all parts}} K(m)$, because $m_{opt}$ optimizes J(m) and not K(m)). From the list of
sub-optimal vectors, the candidate motion vector $m_{candidate}$ minimizing
$|J(m_{opt}) - J(m_{candidate})|$ is then selected (note that $m_{candidate}$ can be a vector of
any part of the current image). For the set of optimal motion vectors and the candidate
vector (in place of the optimal vector for the corresponding part of the image), an inverse
motion compensation is applied and $\sum_{\text{all parts}} K(m)$ is computed again. If its value
is lower than $\sum_{\text{all parts}} K(m_{opt})$, the optimal vector $m_{opt}$ of the corresponding
part of the image is replaced by $m_{candidate}$. Finally, $m_{candidate}$
is discarded from the list of sub-optimal vectors. Then a new candidate is
selected and the same mechanism is applied until the list of sub-optimal vectors is empty,
in order to obtain the optimal set of motion vectors.
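The multiple-pass procedure can be sketched in a hypothetical 1D setting: one integer vector per part of B pixels, D taken as the number of previous-frame pixels left unconnected over the whole frame, and R(m - p) approximated by |m - p| with p = 0. The number of sub-optimal vectors is implementation dependent, as stated above; keeping `n_sub` vectors per part, and comparing each candidate with the J(m_opt) of its own part, are assumptions of this sketch. After an accepted replacement, the sum of K(m) of the updated field becomes the new reference. Step letters in the comments follow claim 3; all names are illustrative.

```python
def in_frame(k, m, B, W):
    """True if every pixel of part k, displaced by m, stays inside the frame."""
    return all(0 <= x - m < W for x in range(k * B, (k + 1) * B))


def J(s, c, k, m, B, lam_motion):
    """Equation (1) for part k, with the assumed rate proxy |m|."""
    return sum(abs(s[x] - c[x - m]) for x in range(k * B, (k + 1) * B)) + lam_motion * abs(m)


def total_K(s, c, field, B, lam_unconnected, lam_motion):
    """Sum over all parts of K(m): J(m) terms plus lam * (number of unconnected pixels)."""
    hit = [False] * len(s)
    for k, m in field.items():
        for x in range(k * B, (k + 1) * B):
            hit[x - m] = True
    return (sum(J(s, c, k, m, B, lam_motion) for k, m in field.items())
            + lam_unconnected * hit.count(False))


def multi_pass(s, c, B, candidates, n_sub, lam_unconnected, lam_motion):
    W = len(s)
    field, j_opt, subs = {}, {}, []
    # (a)-(b): per part, optimal and sub-optimal vectors ranked by J(m) only.
    for k in range(W // B):
        valid = [m for m in candidates if in_frame(k, m, B, W)]
        ranked = sorted(valid, key=lambda m: J(s, c, k, m, B, lam_motion))
        field[k] = ranked[0]
        j_opt[k] = J(s, c, k, ranked[0], B, lam_motion)
        subs += [(k, m, J(s, c, k, m, B, lam_motion)) for m in ranked[1:1 + n_sub]]
    # (c): global criterion of the J-optimal field.
    best = total_K(s, c, field, B, lam_unconnected, lam_motion)
    # (d)-(h): test candidates, closest J first, until the list is empty.
    while subs:
        cand = min(subs, key=lambda t: abs(j_opt[t[0]] - t[2]))
        k, m, _ = cand
        trial = dict(field)
        trial[k] = m
        cost = total_K(s, c, trial, B, lam_unconnected, lam_motion)
        if cost < best:                 # (f): keep the candidate
            field, best = trial, cost
        subs.remove(cand)               # (g): discard it from the list
    return field, best


c = [10, 20, 30, 40, 50, 60, 70, 80]   # previous frame
s = [10, 20, 41, 51, 50, 60, 70, 80]   # part 1 is slightly closer to c[4:6] than to c[2:4]
print(multi_pass(s, c, 2, (-2, 0, 2), n_sub=1, lam_unconnected=10, lam_motion=1))
# ({0: 0, 1: 0, 2: 0, 3: 0}, 22)
print(multi_pass(s, c, 2, (-2, 0, 2), n_sub=1, lam_unconnected=0, lam_motion=1))
# ({0: 0, 1: -2, 2: 0, 3: 0}, 20)
```

With a weight of 10 on unconnected pixels, the J-optimal vector m = -2 of part 1 leaves previous pixels 2 and 3 unconnected (sum of K(m) = 20 + 10 · 2 = 40), so the sub-optimal m = 0 is accepted (22). With a zero weight the criterion reduces to the sum of J(m) and the J-optimal field is kept.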
CLAIMS :

1. A method of encoding a sequence of frames, composed of picture elements
(pixels), by means of a three-dimensional (3D) subband decomposition involving a filtering
step applied, in the sequence considered as a 3D volume, to the spatial-temporal data
which correspond in said sequence to each one of successive groups of frames (GOFs),
these GOFs being themselves subdivided into successive pairs of frames (POFs) including a
so-called previous frame and a so-called current frame, said decomposition being applied
to said GOFs together with motion estimation and compensation steps performed in each
GOF on said POFs and on corresponding pairs of low-frequency temporal subbands (POSs)
obtained at each temporal decomposition level, this process of motion compensated
temporal filtering leading in the previous frames on the one hand to connected pixels, which
are filtered along a motion trajectory corresponding to motion vectors defined by means of
said motion estimation steps, and on the other hand to a residual number of so-called
unconnected pixels, which are not filtered at all, each motion estimation step comprising a
motion search provided for returning a motion vector that minimizes a cost function
depending at least on a distortion criterion involving a distortion measure, said distortion
measure being also applied to the set of said unconnected pixels.

2. An encoding method according to claim 1, in which said motion search is
provided for minimizing the following expression (1) :

$$J(m) = \mathrm{SAD}(s, c(m)) + \lambda_{MOTION} \cdot R(m - p) \qquad (1)$$

with $m = (m_x, m_y)^T$ being the motion vector, $p = (p_x, p_y)^T$ being the prediction for the
motion vector, $\lambda_{MOTION}$ being the Lagrange multiplier, the rate term $R(m - p)$
representing the motion information only, and SAD, used as distortion measure, being
computed as :

$$\mathrm{SAD}(s, c(m)) = \sum_{x=1}^{B} \sum_{y=1}^{B} \left| s[x, y] - c[x - m_x, y - m_y] \right| \qquad (2)$$

with s being the original video signal, c being the coded video signal and B being the
block size, and in which the distortion criterion extends equation (1), taking into account
the unconnected pixels phenomenon for the minimizing operation that is now applied to
the following expression (3) :

$$K(m) = J(m) + \lambda_{UNCONNECTED} \cdot D(S_{UNCONNECTED}(m)) \qquad (3)$$

or

$$K(m) = \mathrm{SAD}(s, c(m)) + \lambda_{UNCONNECTED} \cdot D(S_{UNCONNECTED}(m)) + \lambda_{MOTION} \cdot R(m - p) \qquad (4)$$

PHFR020140 EPp 

with $D(S_{UNCONNECTED}(m))$ being the distortion measure for the set $S_{UNCONNECTED}$ of
unconnected pixels resulting from the motion vector m.

3. An encoding method according to claim 2, in which, for taking into account
the distortion due to the unconnected pixels, the global criterion $\sum_{\text{all parts}} K(m)$ is
minimized for the whole image by means of the following steps :

(a) the optimal motion vector $m_{opt}$ is computed, as well as a set of $N_{sub\text{-}opt}$
sub-optimal motion vectors $\{m_{sub\text{-}opt}\}$ that provide the minimum values for
J(m) ;

(b) for all these vectors, the corresponding value of the criterion J(m) is
stored, in order to generate $J(m_{opt})$ and $\{J(m_{sub\text{-}opt})\}$ ;

(c) an inverse motion compensation is applied for the optimal motion vectors
$m_{opt}$, in order to compute $\sum_{\text{all parts}} K(m_{opt})$ ;

(d) from the list of sub-optimal vectors, the candidate motion vector $m_{candidate}$
minimizing $|J(m_{opt}) - J(m_{candidate})|$ is selected ;

(e) for the set of optimal motion vectors and the candidate vector, an inverse
motion compensation is applied, in order to compute $\sum_{\text{all parts}} K(m)$ again ;

(f) if the value of $\sum_{\text{all parts}} K(m)$ is lower than $\sum_{\text{all parts}} K(m_{opt})$,
the optimal vector $m_{opt}$ is replaced by $m_{candidate}$ for the corresponding part of the image ;

(g) finally, $m_{candidate}$ is discarded from the list of sub-optimal vectors ;

(h) a new candidate is selected, and the same mechanism is applied until the
list of sub-optimal vectors is empty, in order to obtain the optimal set of motion vectors.
4. A computer programme comprising a set of instructions for the 
implementation of a method according to claim 3, when said programme is carried out by a 
processor included in an encoding device. 
Abstract

The invention relates to a method of encoding a sequence of frames by means
of a three-dimensional (3D) subband decomposition involving a filtering step applied to the
spatial-temporal data corresponding to successive groups of frames (GOFs), and to a
recursive implementation of said method. The GOFs are subdivided into successive pairs of
frames (POFs), and the decomposition is applied to said GOFs together with motion
estimation and compensation steps performed on said POFs and on corresponding pairs of
low-frequency temporal subbands (POSs) obtained at each temporal decomposition level.
Since the process of motion compensated temporal filtering leads, in the previous frames, on
the one hand to connected pixels, which are filtered, and on the other hand to a residual
number of unconnected pixels, which are not filtered, each motion estimation step comprises
a motion search provided for returning a motion vector that minimizes a cost function
depending at least on a distortion criterion, said criterion taking into account the
unconnected pixels phenomenon for the minimizing operation, which is itself based on
specific rules that yield the optimal set of motion vectors.